AITopics | safety budget

Collaborating Authors

safety budget

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

debd0ae2083160397a22a4a8831c7230-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 09:02:26 GMT

ablation, constraint, safety budget, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)

Add feedback

EffectsofSafetyStateAugmentationon SafeExploration

Neural Information Processing SystemsFeb-12-2026, 09:02:22 GMT

There are still, however, some unsolved challenges for a successful deployment of RL such as efficient learning of constrained or safe Markov Decision Processes (MDPs) [4].

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Africa > Togo (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Add feedback

Enhancing Safe Exploration Using Safety State Augmentation

Neural Information Processing SystemsDec-25-2025, 12:07:32 GMT

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations - a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that simmering a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.

enhancing safe exploration, name change, safety state augmentation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.84)

Add feedback

Appendices

Neural Information Processing SystemsAug-19-2025, 11:55:31 GMT

Note that this safe RL problem is less general than the standard formulation of safe RL. The authors introduce a teacher-student hierarchy. To learn the teacher's policy the following constraints are followed: a1 The unsafe set is contained in the intervention set D D The teacher learns when to intervene and to switch between different interventions. A1.2 RL with probability one constraints We have introduced the safety state to the environment as follows: s First, we discuss our design for the PI controller and discuss the necessary parts for it. The proportional part delivers brute force control by having a large control magnitude for large errors, but it is not effective if the instantaneous error values become small.

ablation, constraint, safety budget, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)

Add feedback

debd0ae2083160397a22a4a8831c7230-Paper-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 11:55:28 GMT

constraint, machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Enhancing Safe Exploration Using Safety State Augmentation

Neural Information Processing SystemsJan-19-2025, 02:49:00 GMT

constraint, enhancing safe exploration, safety state augmentation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Mukherjee, Subhojyoti, Hanna, Josiah P., Nowak, Robert

arXiv.org Artificial IntelligenceJun-4-2024

In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data and we are interested in the question of what \textit{behavior} policy should collect the data for the most accurate evaluation of the target policy. While prior work has considered behavior policy selection, in this paper, we additionally consider a safety constraint on the behavior policy. Namely, we assume there exists a known default policy that incurs a particular expected cost when run and we enforce that the cumulative cost of all behavior policies ran is better than a constant factor of the cost that would be incurred had we always run the default policy. We first show that there exists a class of intractable MDPs where no safe oracle algorithm with knowledge about problem parameters can efficiently collect data and satisfy the safety constraints. We then define the tractability condition for an MDP such that a safe oracle algorithm can efficiently collect data and using that we prove the first lower bound for this setting. We then introduce an algorithm SaVeR for this problem that approximates the safe oracle algorithm and bound the finite-sample mean squared error of the algorithm while ensuring it satisfies the safety constraint. Finally, we show in simulations that SaVeR produces low MSE policy evaluation while satisfying the safety constraint.

optimal data collection strategy, safe policy evaluation, safety constraint, (10 more...)

arXiv.org Artificial Intelligence

2406.02165

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Xu, Haotian, Wang, Shengjie, Wang, Zhaolei, Zhang, Yunzhe, Zhuo, Qing, Gao, Yang, Zhang, Tao

arXiv.org Artificial IntelligenceJul-27-2023

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety of learning-based controllers is an essential notion of ensuring the effectiveness of the controllers. Current methods adopt whole consistency constraints during the training, thus resulting in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between the exploration efficiency and the constraints satisfaction. In the early stage, our method loosens the practical constraints of unsafe transitions (adding extra safety budget) with the aid of a new metric we propose. With the training process, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit's demand in the final training stage. When evaluated on Safety-Gym and Bullet-Safety-Gym benchmarks, our method has shown its advantages over baseline algorithms in terms of safety and optimality. Remarkably, our method gains remarkable performance improvement under the same cost limit compared with baselines.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2302.14339

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.34)

Add feedback

Effects of Safety State Augmentation on Safe Exploration

Sootla, Aivar, Cowen-Rivers, Alexander I., Wang, Jun, Ammar, Haitham Bou

arXiv.org Artificial IntelligenceOct-12-2022

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering, a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2206.02675

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.48)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.56)

Add feedback